{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8229a45d-e5f3-4f86-8b97-e59640d603c0",
   "metadata": {},
   "source": [
    "# Exploring DuckDB’s Python API\n",
    "\n",
    "This notebook contains the code examples from chapter 8 of *Getting Started with DuckDB*."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a812d089-37e3-4cc9-a732-14e5bc3a7d96",
   "metadata": {},
   "source": [
    "## Technical requirements\n",
    "\n",
    "In order to run the examples in this notebook, you'll need to install the Python dependencies for this project. You can do this by running the following command in your terminal when in the root directory of the project. Note that ideally this should be using a Python virtual environment for this project.\n",
    "\n",
    "    pip install -r requirements.txt\n",
    "\n",
    "For complete instructions on how to set up your environment for working through the examples, please consult the *Technical requirements* section of the chapter *Setting up the DuckDB Python Client*."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcaadb1b-4171-4657-be2d-571fedb3b5f8",
   "metadata": {},
   "source": [
    "## Working with the Relational API"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "975793a3-2436-4191-8102-84d32f81bfbb",
   "metadata": {},
   "source": [
    "### Creating relations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff12805c-103b-4d35-bad1-b76bae490b2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import duckdb\n",
    "\n",
    "pi_relation = duckdb.sql(\"SELECT pi() AS pi\") \n",
    "\n",
    "type(pi_relation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aabf3d14-c744-4e4d-b75f-4670c4d1438b",
   "metadata": {},
   "outputs": [],
   "source": [
    "pi_relation.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f5a20d7-1c4f-43af-8e77-310531793fe8",
   "metadata": {},
   "outputs": [],
   "source": [
    "pi_relation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98828ff8-8857-46fd-921f-f11e5b50b419",
   "metadata": {},
   "source": [
    "#### Creating relations from SQL queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52ee45fa-4e29-4ae0-9c58-e685ecaeb61d",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"\"\" \n",
    "    SELECT * \n",
    "    FROM read_csv('Seattle_Pet_Licenses.csv')\n",
    "    \"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ad66125-184f-4210-a6ae-354a313992e3",
   "metadata": {},
   "source": [
    "#### Creating relations from files "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "564cf52e-7cb7-4c1e-aaf9-ce7adffc15a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_csv_relation = duckdb.read_csv(\"Seattle_Pet_Licenses.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87d9fe24-1c62-45b9-ba60-40bbc0760789",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_csv_relation.types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3c74638-0e5e-4b84-86d2-d3668de7d905",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_csv_relation = duckdb.read_csv(\n",
    "    \"Seattle_Pet_Licenses.csv\",\n",
    "    dtype={\"License Issue Date\": \"DATE\"},\n",
    "    date_format=\"%B %d %Y\",\n",
    ")\n",
    "\n",
    "pets_csv_relation.types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1bbbb3af-06ed-4a4e-b055-e289e531fd19",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_csv_relation.limit(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2655eee1-dd13-4b5d-a435-41bafc7f4f9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "help(duckdb.read_csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "222c9078-8bf0-43f4-9594-babac63cefd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(pets_csv_relation.sql_query())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97b57470-022b-4aef-8565-a51394db7a9e",
   "metadata": {},
   "source": [
    "#### Creating relations from tables "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88b16d20-8395-4e2f-b089-45ff9598cb55",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_csv_relation.to_table(\"seattle_pets_dataset\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2281a79a-d351-4476-8cb4-6aa474171aea",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_table_relation = duckdb.table(\"seattle_pets_dataset\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "905e99ef-b4f0-423f-ab26-49828288a9cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\"SHOW TABLES\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95bc7dad-d16b-4292-be9a-77d5751e2b9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_table_relation = duckdb.table(\"seattle_pets_dataset\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8a1cedd-13d8-4a7f-abc8-f531ba4d1907",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_table_relation.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "083777fe-f8fb-46a1-ae06-a60468b83ce9",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_table_relation.count(\"*\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "429d2c7e-2f85-4651-95cd-06cb6d973fe9",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\"DROP TABLE seattle_pets_dataset\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d22afd71-8255-4bb8-87f6-3ad32cf992f6",
   "metadata": {},
   "source": [
    "### Querying relations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9ea1272-2f1f-49aa-9172-187efe6b6db5",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_cleaned_relation = duckdb.sql(\n",
    "    \"\"\" \n",
    "    SELECT  \n",
    "        \"License Issue Date\" AS issue_date, \n",
    "        \"Animal's Name\" AS pet_name, \n",
    "        \"Species\" AS species, \n",
    "        \"Primary Breed\" AS breed \n",
    "    FROM pets_csv_relation \n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "pets_cleaned_relation.limit(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02570d76-920e-4ecc-b7e1-3d1e636f0fb5",
   "metadata": {},
   "outputs": [],
   "source": [
    "min_max_relation = duckdb.sql(\n",
    "    \"\"\"\n",
    "    SELECT min(issue_date), max(issue_date)\n",
    "    FROM pets_cleaned_relation\n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "min_max_relation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99acac44-7558-4bc9-828f-6c6a8da8495b",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(min_max_relation.explain())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38b6cbc8-e860-45f6-a979-04eba034579a",
   "metadata": {},
   "outputs": [],
   "source": [
    "type(min_max_relation)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c748e6e7-01a1-4528-9aee-b9fd254e147e",
   "metadata": {},
   "source": [
    "### Transformations with relations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc5664c5-2e7b-4b01-a7bc-976f11091fc4",
   "metadata": {},
   "source": [
    "#### Comparing SQL queries and Relational API method calling "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e72ffce7-dab5-4cbd-a025-d14255b7c6a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"\"\"\n",
    "    SELECT *\n",
    "    FROM pets_cleaned_relation\n",
    "    WHERE species = 'Pig'\n",
    "    \"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d656fd6-0188-4bcf-8f96-bd053a47b031",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_cleaned_relation.filter(\"species = 'Pig'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0620ee76-1aee-4d28-aa37-5fa62e903012",
   "metadata": {},
   "source": [
    "#### The Expression API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52526d31-4dfe-4330-8e14-c0e3d246efa2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from duckdb import ColumnExpression, ConstantExpression\n",
    "\n",
    "species_col = ColumnExpression(\"species\")\n",
    "pig_constant = ConstantExpression(\"Pig\")\n",
    "\n",
    "pets_cleaned_relation.filter(species_col == pig_constant)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "629be5f0-43e8-41d3-b389-4d725d08a82f",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_cleaned_relation.filter(\n",
    "    \"species = 'Cat' AND pet_name = 'Leeloo'\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ec81056-1d85-4514-aab3-4695312cfd27",
   "metadata": {},
   "outputs": [],
   "source": [
    "name_col = ColumnExpression(\"pet_name\")\n",
    "\n",
    "is_cat = species_col == ConstantExpression(\"Cat\")\n",
    "is_leeloo = name_col == ConstantExpression(\"Leeloo\")\n",
    "\n",
    "pets_cleaned_relation.filter(is_cat & is_leeloo)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55ae21e6-39d5-4c3a-8e50-92805882063d",
   "metadata": {},
   "source": [
    "#### Method chaining "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a28354b-4e06-4d96-9a3e-9f6279616ce7",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_cleaned_relation.filter(is_cat).limit(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd210945-63c7-49b0-8839-4aa9deabb0be",
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cat_names_rel = (\n",
    "    pets_cleaned_relation\n",
    "    .filter(is_cat)\n",
    "    .select(\"pet_name\")\n",
    "    .distinct()\n",
    "    .count(\"*\")\n",
    ")\n",
    "\n",
    "num_cat_names_rel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "616cfbed-776f-4f0e-af90-6af75ba2b886",
   "metadata": {},
   "outputs": [],
   "source": [
    "alt_num_cat_names_rel = (\n",
    "    pets_cleaned_relation\n",
    "    .filter(is_cat)\n",
    "    .unique(\"pet_name\")\n",
    "    .count(\"*\")\n",
    ")\n",
    "\n",
    "alt_num_cat_names_rel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "068c2509-44ab-44ac-b2e5-5225f9c2d2e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "num_unique_cats_sql = alt_num_cat_names_rel.sql_query()\n",
    "\n",
    "print(num_unique_cats_sql)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cbf3bc26-aced-40a4-a622-78d3e6161d03",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlparse\n",
    "\n",
    "formatted_sql = sqlparse.format(\n",
    "    num_unique_cats_sql,\n",
    "    reindent=True\n",
    ")\n",
    "print(formatted_sql)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a36c534b-a1d3-4112-bcbb-4770e29e06ee",
   "metadata": {},
   "source": [
    "#### Putting it all together"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee57fc4e-1b13-4e62-a37b-d3d56d70e977",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"\"\"\n",
    "    SELECT *, length(pet_name) AS name_length \n",
    "    FROM pets_cleaned_relation \n",
    "    ORDER BY name_length DESC\n",
    "    LIMIT 10\n",
    "    \"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a91866b-53c2-4bd2-8ce9-9a2ccfd49936",
   "metadata": {},
   "outputs": [],
   "source": [
    "from duckdb import (\n",
    "    ColumnExpression,\n",
    "    FunctionExpression,\n",
    "    StarExpression\n",
    ")\n",
    "\n",
    "star = StarExpression()\n",
    "name_col = ColumnExpression(\"pet_name\")\n",
    "name_length_col = (\n",
    "    FunctionExpression(\"length\", name_col)\n",
    "    .alias(\"name_length\")\n",
    ")\n",
    "name_length_sort = ColumnExpression(\"name_length\").desc()\n",
    "\n",
    "longest_names_relation = (\n",
    "    pets_cleaned_relation\n",
    "    .select(star, name_length_col)\n",
    "    .sort(name_length_sort)\n",
    "    .limit(10)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ae3215b-9dd4-4911-a889-51c221185456",
   "metadata": {},
   "source": [
    "### Adopting a querying approach"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eac076a-e3e1-49f7-acce-16563cae216f",
   "metadata": {},
   "source": [
    "### Writing to disk "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76a1badf-f6f2-4ee4-80ba-79c41c2c2fc4",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_cleaned_relation.write_csv(\"seattle_pets.csv\")\n",
    "\n",
    "pets_cleaned_relation.write_parquet(\"seattle_pets.parquet\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6f6c11d-dad0-45ad-acb6-e8e3999b2284",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"COPY pets_cleaned_relation TO 'seattle_pets.csv'\"\n",
    ")\n",
    "\n",
    "duckdb.sql(\n",
    "    \"COPY pets_cleaned_relation TO 'seattle_pets.parquet'\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7ee21fc-b085-450c-a803-f3b2db0c44ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"COPY pets_cleaned_relation TO 'seattle_pets.json'\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ced8157e-291a-4311-9945-8e2e327f008d",
   "metadata": {},
   "source": [
    "### Modifying the database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fd3cde6-0883-422a-84fe-1c2efed28fc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = duckdb.connect(\"seattle_pets.db\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "390bed90-326d-479e-a90e-0c1571e4f9d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.read_parquet(\"seattle_pets.parquet\").to_table(\"pets\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e069a49e-3a9e-46a1-b974-d54f5f54c675",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.sql(\"SHOW TABLES\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9abd960e-ed4d-438c-8dff-94509eff231b",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.table(\"pets\").count(\"*\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fe4a35c-8bb3-4efa-aee3-cf16487d4a1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_dog1 = (\"2024-05-19\", \"Monty\", \"Dog\", \"Border Collie\") \n",
    "\n",
    "conn.table(\"pets\").insert(new_dog1) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40841e9e-bbd6-435a-b64a-c1354305f806",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_dog2 = (\"2024-05-19\", \"Pixie\", \"Dog\", \"Australian Kelpie\") \n",
    "\n",
    "new_dog_rel = conn.values(new_dog2) \n",
    "\n",
    "new_dog_rel.insert_into(\"pets\") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1499790c-e713-4b33-a677-93ad0d885e31",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.table(\"pets\").filter(\"issue_date = '2024-05-19'\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79a64653-8cbd-42a9-9c01-bfbeb048bbbb",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82fc4ec7-b55b-4745-947e-89f70e23abf0",
   "metadata": {},
   "source": [
    "## Working with the Python DB-API"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1b30b67-3e91-474a-b2ee-f1f033bfc02d",
   "metadata": {},
   "source": [
    "### Connecting to a database "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "723f9775-8fdf-4f22-acd2-fbe407d59683",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = duckdb.connect()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bacf9d7-cd20-47dd-86f9-fea6bcfcafae",
   "metadata": {},
   "source": [
    "### Querying databases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2653baa9-f6ea-4d60-acbb-2f430ba17cf5",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.execute(\n",
    "    \"\"\"\n",
    "    CREATE OR REPLACE TABLE seattle_pets AS\n",
    "    SELECT * FROM 'seattle_pets.parquet'\n",
    "    \"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0885d642-e214-42b1-8c66-53d75819422d",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.execute(\"SELECT * FROM seattle_pets\") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6413d13e-0cc3-4dee-a7f2-7a89ba5e9b31",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.fetchone()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0579f842-5003-4866-b50e-19bf11613204",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.description"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ce98950-9429-4db0-8cf9-c60afb88fd48",
   "metadata": {},
   "outputs": [],
   "source": [
    "[conn.fetchone() for i in range(3)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0cc1387d-181f-45b7-a40e-8178ebcbb622",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.fetchmany(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "977d0a16-36f0-43de-ab34-4ffc380c55d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "rest_rows = conn.fetchall()\n",
    "\n",
    "len(rest_rows)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52afd34c-95bf-4f90-8085-29e554ab6271",
   "metadata": {},
   "source": [
    "### Running SQL queries using Prepared statements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e04ad35-f762-496a-aea8-55371bc9a230",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "\n",
    "new_pet1 = [\n",
    "    datetime.date.today(),\n",
    "    \"Ned\",\n",
    "    \"Dog\",\n",
    "    \"Border Collie\"\n",
    "]\n",
    "\n",
    "conn.execute(\n",
    "    \"INSERT INTO seattle_pets VALUES (?, ?, ?, ?)\", \n",
    "    parameters=new_pet1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07c6fe53-d4b3-422b-a5e5-a617a973657f",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_pet2 = {\n",
    "    \"name\": \"Simon\",\n",
    "    \"species\": \"Cat\",\n",
    "    \"breed\": \"Bombay\",\n",
    "    \"issue_date\": datetime.date.today(),\n",
    "}\n",
    "\n",
    "conn.execute(\n",
    "    \"\"\"\n",
    "    INSERT INTO seattle_pets VALUES\n",
    "        ($issue_date, $name, $species, $breed)\n",
    "    \"\"\",\n",
    "    new_pet2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77b66a11-8563-4763-82a5-011ce47b80c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.execute(\n",
    "    \"\"\" \n",
    "    SELECT * \n",
    "    FROM seattle_pets \n",
    "    WHERE issue_date = ?; \n",
    "    \"\"\",\n",
    "    [datetime.date.today()],\n",
    ").fetchall()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ebb6ce5-ff64-441e-9ea7-52afb97d5efd",
   "metadata": {},
   "source": [
    "### Writing to disk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "757bf125-4e7a-4d71-827f-b3c1e6c92deb",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.execute(\n",
    "    \"COPY seattle_pets TO 'seattle_pets_updates.csv'\"\n",
    ") \n",
    "\n",
    "conn.execute(\n",
    "    \"COPY seattle_pets TO 'seattle_pets_updates.parquet'\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f156982f-e6bf-4b24-a4ed-581f4fa1b650",
   "metadata": {},
   "source": [
    "### Closing the database connection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "164d8957-1ac0-4905-a17e-5aaef5d5c99d",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cd8be65-f86e-4475-82c2-4df73644202d",
   "metadata": {},
   "source": [
    "### Database cursors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2b46d76-28b8-4b99-8436-051857d04a96",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = duckdb.connect()\n",
    "\n",
    "new_conn = conn.cursor()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8e98f73-ee31-4560-a508-231734fdfbaf",
   "metadata": {},
   "source": [
    "## Integration with Python packages and language features "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d9148e7-3d82-456c-9098-36d67828592e",
   "metadata": {},
   "source": [
    "### Consuming Python data structures"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08cca0ac-8854-409b-a3ab-26ecd05ece24",
   "metadata": {},
   "source": [
    "#### Querying Python objects via replacement scans "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f6c4ac9-3870-4ce3-b075-e4a129359b7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd  \n",
    "\n",
    "pets_df = pd.read_parquet(\"seattle_pets.parquet\")\n",
    "\n",
    "duckdb.sql(\"SELECT * FROM pets_df USING SAMPLE 1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a836da42-b2cf-45cf-a457-7865c6649538",
   "metadata": {},
   "source": [
    "#### Registering objects as virtual tables "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b123215-5d55-4454-ab34-e53cd7c8831e",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_dict = {\n",
    "    \"seattle\": pd.read_parquet(\"seattle_pets.parquet\")\n",
    "}\n",
    "\n",
    "duckdb.register(\"pets_view\", pets_dict[\"seattle\"])\n",
    "\n",
    "duckdb.sql(\"SELECT * FROM pets_view USING SAMPLE 1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "342bf97b-effa-469a-a34b-034f7fa4537b",
   "metadata": {},
   "source": [
    "#### Creating tables from objects "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98657e52-cb99-4f2d-9ea0-4c9af0f1244f",
   "metadata": {},
   "outputs": [],
   "source": [
    "pets_df = pd.read_parquet(\"seattle_pets.parquet\")\n",
    "\n",
    "duckdb.sql(\n",
    "    \"\"\"\n",
    "    CREATE OR REPLACE TABLE pets_table_from_df AS\n",
    "    SELECT * FROM pets_df\n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "duckdb.sql(\"SELECT * FROM pets_table_from_df USING SAMPLE 1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd1e4185-1f2c-44dd-a25d-42fc0408774d",
   "metadata": {},
   "source": [
    "### Converting query results"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c322198-5d3f-4abb-b8f5-3dd5e3b332cb",
   "metadata": {},
   "source": [
    "#### Converting to dataframes "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bb3ce6f-3ff4-4bcd-b0b2-3e6f5f2b6013",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = duckdb.connect()\n",
    "seattle_pets = conn.from_parquet(\"seattle_pets.parquet\")\n",
    "\n",
    "pandas_df = seattle_pets.df()\n",
    "pandas_df[\n",
    "    pandas_df[\"species\"] == \"Dog\"\n",
    "].value_counts(\"breed\")[:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b63ac3dc-edb1-495a-9d28-3965c190f6b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import polars as pl\n",
    "\n",
    "polars_df = seattle_pets.pl()\n",
    "polars_df.filter(\n",
    "    pl.col(\"species\") == \"Cat\"\n",
    ")[\"breed\"].value_counts(sort=True)[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98ca5b37-01bc-44a1-86e8-a1acd06f1831",
   "metadata": {},
   "source": [
    "#### Converting to Arrow tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6ba1dd3-8396-4af1-aada-53b4b6d93825",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = duckdb.connect()\n",
    "\n",
    "conn.execute(\"SELECT * FROM 'seattle_pets.parquet'\")\n",
    "\n",
    "pets_table = conn.arrow()\n",
    "\n",
    "pets_table.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "215be285-2836-48c7-bd22-b3509c12cc0a",
   "metadata": {},
   "source": [
    "### Data types: from Python to DuckDB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6f62f08-1a41-4b37-917c-3936b85dab05",
   "metadata": {},
   "outputs": [],
   "source": [
    "varchar_type = duckdb.typing.VARCHAR \n",
    "\n",
    "bigint_type = duckdb.typing.BIGINT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47ca6c3a-6136-40bf-999e-3355949628cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "varchar_type = duckdb.typing.DuckDBPyType(str) \n",
    "\n",
    "bigint_type = duckdb.typing.DuckDBPyType(int)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a10741a1-ec72-4a80-a86c-3f8f65d6e2e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.values(\n",
    "    [\n",
    "        10,\n",
    "        1_000_000,\n",
    "        0.95,\n",
    "        \"hello string\",\n",
    "        b\"hello bytes\",\n",
    "        True,\n",
    "        datetime.date.today(),\n",
    "        None,\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1adb6f9b-7907-49e0-bdb0-82c420d2ba34",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.values(\n",
    "    [\n",
    "        (1, 2), \n",
    "        [\"hello\", \"world\"],\n",
    "        {\"key1\": 10, \"key2\": \"quack!\"}\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb3c131a-d008-4dfd-8f37-c5e948ed1e3c",
   "metadata": {},
   "source": [
    "### User-defined functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f41984cc-fe5e-4a73-8dac-0e26c1b940ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "import emoji\n",
    "\n",
    "def emojify(species):\n",
    "    \"\"\"Converts a string into a single emoji.\"\"\"\n",
    "    emoji_str = emoji.emojize(f\":{species.lower()}:\")\n",
    "    if emoji.is_emoji(emoji_str):\n",
    "        return emoji_str\n",
    "    return None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d0b4954-3ec3-43a7-8bac-82b0b984705f",
   "metadata": {},
   "outputs": [],
   "source": [
    "emojify(\"goat\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fa5c784-be8d-46ec-aaaf-67e7297fd1f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.create_function(\n",
    "    \"emojify\", \n",
    "    emojify, \n",
    "    [duckdb.typing.VARCHAR],\n",
    "    duckdb.typing.VARCHAR\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33d6e694-db7b-4b69-95f6-808969173434",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.sql(\n",
    "    \"\"\" \n",
    "    SELECT *, emojify(species) AS emoji \n",
    "    FROM 'seattle_pets_updates.parquet' \n",
    "    USING SAMPLE 10 \n",
    "    \"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b6016cc-dd6b-479a-9e5d-548cf877586a",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.remove_function(\"emojify\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fab67e91-5424-415d-a853-4b7351a1a574",
   "metadata": {},
   "outputs": [],
   "source": [
    "duckdb.create_function(\"emojify\", emojify, [str], str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4becdbf1-f382-4d43-b092-5482c8e1e0a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "def emojify(species: str) -> str:\n",
    "    \"\"\"Converts a string into a single emoji.\"\"\"\n",
    "    emoji_str = emoji.emojize(f\":{species.lower()}:\")\n",
    "    if emoji.is_emoji(emoji_str):\n",
    "        return emoji_str\n",
    "    return None\n",
    "\n",
    "duckdb.remove_function(\"emojify\")\n",
    "\n",
    "duckdb.create_function(\"emojify\", emojify)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9cca22d-72ca-4b99-86a5-4de36574a0ae",
   "metadata": {},
   "source": [
    "### Handling exceptions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c79058ff-ff2a-48fa-8b98-75dd85ab7078",
   "metadata": {},
   "outputs": [],
   "source": [
    "from duckdb import ConversionException\n",
    "\n",
    "try:\n",
    "    duckdb.execute(\"SELECT '5,000'::INTEGER\").fetchall()\n",
    "except ConversionException as error:\n",
    "    print(error)\n",
    "    # handle exception..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a835483-4509-410a-abcc-ed6503ef1121",
   "metadata": {},
   "outputs": [],
   "source": [
    "from duckdb import CatalogException\n",
    "\n",
    "try:\n",
    "    duckdb.sql(\"SELECT * from imaginary_table\")\n",
    "except CatalogException as error:\n",
    "    print(error)\n",
    "    # handle exception..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5bc751a-a341-40cb-ad3c-6cfacbcc7d6e",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
